{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "7H3yTncQfoym"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "h1CiDh7CfqON"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "pp-UaomMQJNo"
      },
      "source": [
        "# Vanilla Autoencoder\n",
        "\n",
        "_Notebook orignially contributed by: [afagarap](https://github.com/afagarap)_\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github.com/tensorflow/examples/blob/master/community/en/autoencoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/examples/tree/master/community/en/autoencoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "NBhbug0Qgmuu"
      },
      "source": [
        "## Overview\n",
        "In this notebook, we will create a **vanilla autoencoder** model using the [TensorFlow subclassing API](https://www.tensorflow.org/guide/keras#model_subclassing). We are going to use the popular [MNIST dataset](http://yann.lecun.com/exdb/mnist/) (Grayscale images of hand-written digits from 0 to 9).\n",
        "\n",
        "We deal with huge amount of data in machine learning which naturally leads to more computations. However, we can pick the parts of the data which contribute the most to a model's learning, thus leading to less computations. The process of choosing the _important_ parts of data is known as _feature selection_, which is among the number of use cases of an _autoencoder_.\n",
        "\n",
        "But what exactly is an autoencoder? Well, let's first recall that a neural network is a computational model that is used for finding a function describing the relationship between data features $x$ and its values or labels $y$, i.e. $y = f(x)$. \n",
        "\n",
        "Now, an autoencoder is also a neural network. But instead of finding the function _mapping the features_ $x$ to their _corresponding values or labels_ $y$, it aims to find the function mapping the _features_ $x$ _to itself_ $x$. Wait, what? Why would we do that?\n",
        "\n",
        "Well, what's interesting is what happens inside the autoencoder."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zK2SZnszhSYk"
      },
      "source": [
        "## Setup\n",
        "Let's start by importing the libraries and functions that we will need."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "S1sepk9uMddm"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "import tensorflow as tf\n",
        "assert tf.__version__.startswith('2')\n",
        "\n",
        "from tensorflow.keras.datasets import mnist\n",
        "\n",
        "print('TensorFlow version:', tf.__version__)\n",
        "print('Is Executing Eagerly?', tf.executing_eagerly())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CH3QqWQ1RiP2"
      },
      "source": [
        "## Autoencoder model\n",
        "\n",
        "An autoencoder consists of two components: (1) an **encoder** which learns the data representation $z$, i.e. the important features of a given data $x$ (I like to describe it as _what makes something something_), and (2) a **decoder** which reconstructs the data $\\hat{x}$ based on its idea $z$ of how it is structured.\n",
        "$$ z = f\\big(h_{e}(x)\\big)$$\n",
        "$$ \\hat{x} = f\\big(h_{d}(z)\\big)$$\n",
        "where $z$ is the learned data representation by encoder $h_{e}$, and $\\hat{x}$ is the reconstructed data by decoder $h_{d}$ based on $z$.\n",
        "\n",
        "Let's further dissect the model below."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "fiYJ9BLW-w40"
      },
      "source": [
        "### Define an encoder layer\n",
        "\n",
        "The first component, the **encoder**, is similar to a conventional feed-forward network. However, it is not tasked on predicting values (a _regression_ task) or categories (a _classification_ task). Instead, it is tasked to learn how the data is structured, i.e. data representation $z$. We can implement the encoder layer as follows,"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "fdl5_0Z4-w41"
      },
      "outputs": [],
      "source": [
        "class Encoder(tf.keras.layers.Layer):\n",
        "    def __init__(self, intermediate_dim):\n",
        "        super(Encoder, self).__init__()\n",
        "        self.hidden_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)\n",
        "        self.output_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)\n",
        "    \n",
        "    def call(self, input_features):\n",
        "        activation = self.hidden_layer(input_features)\n",
        "        return self.output_layer(activation)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "7WgR-V0--w44"
      },
      "source": [
        "The _encoding_ is done by passing data input $x$ to the encoder's hidden layer $h$ in order to learn the data representation $z = f(h(x))$.\n",
        "\n",
        "We first create an `Encoder` class that inherits the [`tf.keras.layers.Layer`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer) class to define it as a layer. So, why a layer instead of a model? Recall that the encoder is a _component_ of the autoencoder model.\n",
        "\n",
        "Analyzing the code, the `Encoder` layer is defined to have a single hidden layer of neurons (`self.hidden_layer`) to learn the input features. Then, we connect the hidden layer to a layer (`self.output_layer`) that encodes the learned activations to lower dimension which consists of what it thinks as important features. Hence, the \"output\" of the `Encoder` layer is the _what makes something something_ $z$ of the data $x$."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "W9me1ZLq-w45"
      },
      "source": [
        "### Define a decoder layer\n",
        "\n",
        "The second component, the **decoder**, is also similar to a feed-forward network. However, instead of reducing data to lower dimension, it attempts to reverse the process, i.e. reconstruct the data $\\hat{x}$ from its lower dimension representation $z$ to its original dimension.\n",
        "\n",
        "The _decoding_ is done by passing the lower dimension representation $z$ to the decoder's hidden layer $h$ in order to reconstruct the data to its original dimension $\\hat{x} = f(h(z))$. We can implement the decoder layer as follows,"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "m3za66lwMjWX"
      },
      "outputs": [],
      "source": [
        "class Decoder(tf.keras.layers.Layer):\n",
        "    def __init__(self, intermediate_dim, original_dim):\n",
        "        super(Decoder, self).__init__()\n",
        "        self.hidden_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)\n",
        "        self.output_layer = tf.keras.layers.Dense(units=original_dim, activation=tf.nn.relu)\n",
        "  \n",
        "    def call(self, code):\n",
        "        activation = self.hidden_layer(code)\n",
        "        return self.output_layer(activation)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "SU-QtEUo-w4-"
      },
      "source": [
        "We now create a `Decoder` class that also inherits the `tf.keras.layers.Layer`.\n",
        "\n",
        "The `Decoder` layer is also defined to have a single hidden layer of neurons to reconstruct the input features $\\hat{x}$ from the learned representation $z$ by the encoder $f\\big(h_{e}(x)\\big)$. Then, we connect its hidden layer to a layer that decodes the data representation from lower dimension $z$ to its original dimension $\\hat{x}$. Hence, the \"output\" of the `Decoder` layer is the reconstructed data $\\hat{x}$ from the data representation $z$.\n",
        "\n",
        "Ultimately, the output of the decoder is the autoencoder's output.\n",
        "\n",
        "Now that we have defined the components of our autoencoder, we can finally build our model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "uZRIHpkl-w5A"
      },
      "source": [
        "### Build the autoencoder model\n",
        "\n",
        "We can now build the autoencoder model by instantiating `Encoder` and `Decoder` layers."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "Rl5HUez7-w5C"
      },
      "outputs": [],
      "source": [
        "class Autoencoder(tf.keras.Model):\n",
        "  def __init__(self, intermediate_dim, original_dim):\n",
        "    super(Autoencoder, self).__init__()\n",
        "    self.loss = []\n",
        "    self.encoder = Encoder(intermediate_dim=intermediate_dim)\n",
        "    self.decoder = Decoder(intermediate_dim=intermediate_dim, original_dim=original_dim)\n",
        "\n",
        "  def call(self, input_features):\n",
        "    code = self.encoder(input_features)\n",
        "    reconstructed = self.decoder(code)\n",
        "    return reconstructed"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "25zYoLju-w5G"
      },
      "source": [
        "As discussed above, the encoder's output is the input to the decoder, as it is written above (`reconstructed = self.decoder(code)`)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "X9NBnWhSnvn4"
      },
      "source": [
        "## Reconstruction error\n",
        "\n",
        "We only discussed and built the model, but we talked about how it actually learns. All we know up to this point is the _flow of learning_ from the input layer of the encoder which supposedly learns the data representation, and use that representation as input to the decoder that reconstructs the original data.\n",
        "Like \"simple\" neural networks, an autoencoder learns through [backpropagation](https://www.youtube.com/watch?v=LOc_y67AzCA). However, instead of comparing the values or labels of the model, we compare the reconstructed data and the original data. Let's call this comparison the reconstruction error function, and it is given by the following equation,\n",
        "$$ L = \\dfrac{1}{n} \\sum_{i=0}^{n-1} \\big(\\hat{x}_{i} - x_{i}\\big)^{2}$$\n",
        "where $\\hat{x}$ is the reconstructed data while $x$ is the original data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "wmWZG-1qLP8p"
      },
      "outputs": [],
      "source": [
        "def loss(preds, real):\n",
        "  return tf.reduce_mean(tf.square(tf.subtract(preds, real)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "KVjT6C2SqmH0"
      },
      "source": [
        "## Forward pass and optimization\n",
        "\n",
        "We will write a function for computing the forward pass, and applying a chosen optimization function."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "KNSJqjY7qnCe"
      },
      "outputs": [],
      "source": [
        "def train(loss, model, opt, original):\n",
        "  with tf.GradientTape() as tape:\n",
        "    preds = model(original)\n",
        "    reconstruction_error = loss(preds, original)\n",
        "  gradients = tape.gradient(reconstruction_error, model.trainable_variables)\n",
        "  gradient_variables = zip(gradients, model.trainable_variables)\n",
        "  opt.apply_gradients(gradient_variables)\n",
        "  \n",
        "  return reconstruction_error"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8c4bNlGLrd5T"
      },
      "source": [
        "## The training loop\n",
        "\n",
        "Finally, we will write a function to run the training loop. This function will take arguments for the model, the optimization function, the loss, the dataset, and the training epochs.\n",
        "\n",
        "The training loop itself uses a `GradientTape` context defined in `train` for each batch."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "K8Sh1UaQrc5D"
      },
      "outputs": [],
      "source": [
        "def train_loop(model, opt, loss, dataset, epochs):\n",
        "  for epoch in range(epochs):\n",
        "    epoch_loss = 0\n",
        "    for step, batch_features in enumerate(dataset):\n",
        "      loss_values = train(loss, model, opt, batch_features)\n",
        "      epoch_loss += loss_values\n",
        "    model.loss.append(epoch_loss)\n",
        "    print('Epoch {}/{}. Loss: {}'.format(epoch + 1, epochs, epoch_loss.numpy()))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8eCxbz9ZSwjr"
      },
      "source": [
        "## Process the dataset\n",
        "\n",
        "Now that we have defined our `Autoencoder` class, the loss function, and the training loop, let's import the dataset. We will normalize the pixel values for each example through dividing by maximum pixel value. We shall flatten the examples from 28 by 28 arrays to 784-dimensional vectors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "eAN1ONp6MvI7"
      },
      "outputs": [],
      "source": [
        "(x_train, _), (x_test, _) = mnist.load_data()\n",
        "\n",
        "x_train = x_train / 255.\n",
        "\n",
        "x_train = x_train.astype(np.float32)\n",
        "x_train = np.reshape(x_train, (x_train.shape[0], 784))\n",
        "x_test = np.reshape(x_test, (x_test.shape[0], 784))\n",
        "\n",
        "training_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(256)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "268qdJGGTULP"
      },
      "source": [
        "## Train the model\n",
        "\n",
        "Now all we have to do is instantiate the autoencoder model and choose an optimization function, then pass the intermediate dimension and the original dimension of the images."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "t8nw7mdKMxvb"
      },
      "outputs": [],
      "source": [
        "model = Autoencoder(intermediate_dim=128, original_dim=784)\n",
        "opt = tf.keras.optimizers.Adam(learning_rate=1e-2)\n",
        "\n",
        "train_loop(model, opt, loss, training_dataset, 20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "VioflTOhTnwl"
      },
      "source": [
        "## Plot the in-training performance\n",
        "\n",
        "Let's take a look at how the model performed during training in a couple of plots."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "azgmhikhM0EE"
      },
      "outputs": [],
      "source": [
        "plt.plot(range(20), model.loss)\n",
        "plt.xlabel('Epochs')\n",
        "plt.ylabel('Loss')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "JccKWvNYTtUW"
      },
      "source": [
        "## Predictions\n",
        "\n",
        "Finally, we will look at some of the predictions. The wrong predictions are labeled in red color."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "-JwpigAlM2dF"
      },
      "outputs": [],
      "source": [
        "number = 10  # how many digits we will display\n",
        "plt.figure(figsize=(20, 4))\n",
        "for index in range(number):\n",
        "    # display original\n",
        "    ax = plt.subplot(2, number, index + 1)\n",
        "    plt.imshow(x_test[index].reshape(28, 28))\n",
        "    plt.gray()\n",
        "    ax.get_xaxis().set_visible(False)\n",
        "    ax.get_yaxis().set_visible(False)\n",
        "\n",
        "    # display reconstruction\n",
        "    ax = plt.subplot(2, number, index + 1 + number)\n",
        "    plt.imshow(model(x_test)[index].numpy().reshape(28, 28))\n",
        "    plt.gray()\n",
        "    ax.get_xaxis().set_visible(False)\n",
        "    ax.get_yaxis().set_visible(False)\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "o4rkBaWN-w5r"
      },
      "source": [
        "## Closing remarks\n",
        "\n",
        "As you may see after training this model, the reconstructed images are quite blurry. A number of things could be done to move forward from this point, e.g. adding more layers, or using a convolutional neural network architecture as the basis of the autoencoder, or use a different kind of autoencoder.\n",
        "\n",
        "## References\n",
        "* Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from [tensorflow.org](https://tensorflow.org/).\n",
        "* Chollet, F. (2016, May 14). Building Autoencoders in Keras. Retrieved March 19, 2019, from https://blog.keras.io/building-autoencoders-in-keras.html\n",
        "* Goodfellow, I., Bengio, Y., \u0026 Courville, A. (2016). Deep learning. MIT press."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "autoencoder.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true,
      "version": "0.3.2"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
